The Shift-add Approach to String Matching

Author

  • Gaston H. Gonnet
Abstract

String searching is a very important component of many problems, including text editing, bibliographic retrieval, and symbol manipulation. Recent surveys of string searching can be found in [4, 18]. The string matching problem consists of finding all occurrences of a pattern of length m in a text of length n. We generalize the problem to allow don't care symbols, the complement of a symbol, and any finite class of symbols. We solve this problem for one or more patterns, with or without mismatches. For small patterns the worst-case time is linear in the size of the text (we say that a pattern is small if m is bounded by a constant).
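
The generalized pattern positions listed above (a single symbol, a finite class, the complement of a symbol, a don't care) all fit a bit-parallel scan in which each text symbol selects a bit mask and one shift-and-mask step advances every partial match at once. The following is a minimal Python sketch using the Shift-And convention (bit i set means the first i+1 pattern positions match); the original paper's bit conventions may differ, and the helpers sym, cls, neg, any_sym and shift_and_search are hypothetical names chosen for illustration. Python integers are unbounded, so the one-word restriction of a real implementation does not appear here.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: each pattern position is a predicate that
# says whether a text symbol is acceptable at that position.
Position = Callable[[str], bool]


def sym(c: str) -> Position:
    """A single symbol."""
    return lambda x: x == c


def cls(chars: str) -> Position:
    """Any symbol from a finite class, e.g. cls("bc")."""
    allowed = frozenset(chars)
    return lambda x: x in allowed


def neg(c: str) -> Position:
    """The complement of a symbol: anything except c."""
    return lambda x: x != c


def any_sym() -> Position:
    """A don't care symbol that matches every text symbol."""
    return lambda x: True


def shift_and_search(pattern: Sequence[Position], text: str) -> List[int]:
    """Return the 0-based end positions of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    masks: Dict[str, int] = {}  # per-symbol masks, built the first time a symbol is seen
    accept = 1 << (m - 1)       # bit meaning "all m positions matched"
    state = 0                   # bit i: pattern[0..i] matches the text ending here
    ends: List[int] = []
    for j, c in enumerate(text):
        if c not in masks:
            # Bit i of the mask is set iff pattern position i accepts symbol c.
            masks[c] = sum(1 << i for i, pos in enumerate(pattern) if pos(c))
        # Extend every partial match by one symbol and start a new one at bit 0.
        state = ((state << 1) | 1) & masks[c]
        if state & accept:
            ends.append(j)
    return ends


# Pattern a[bc]?(not x): "?" is a don't care, the last position excludes "x".
print(shift_and_search([sym("a"), cls("bc"), any_sym(), neg("x")], "abzyacqxabbd"))
# prints [3, 11]
```

Building masks lazily avoids needing the alphabet up front; a table-driven version would precompute one mask per alphabet symbol instead.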


Similar resources

Improved Two-Way Bit-parallel Search

New bit-parallel algorithms for exact and approximate string matching are introduced. TSO is a two-way Shift-Or algorithm, TSA is a two-way Shift-And algorithm, and TSAdd is a two-way Shift-Add algorithm. Tuned Shift-Add is a minimalist improvement to the original Shift-Add algorithm. TSO and TSA are for exact string matching, while TSAdd and tuned Shift-Add are for approximate string matching ...

Bit-parallel string matching under Hamming distance in O(n⌈m/w⌉) worst case time

Given two strings, a pattern P of length m and a text T of length n over some alphabet Σ, we consider the string matching problem under k mismatches. The well-known Shift-Add algorithm (Baeza-Yates and Gonnet, 1992) solves the problem in O(n⌈m log(k)/w⌉) worst case time, where w is the number of bits in a computer word. We present two algorithms that improve this result to O(n⌈m log log(k)/w⌉)...
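
The Shift-Add bound mentioned in this abstract comes from packing one mismatch counter of ⌈log2(k+1)⌉ + 1 bits per pattern position into machine words, so m positions need about m log(k) bits. A minimal Python sketch of that packing follows, assuming a plain-string pattern (no symbol classes). The spare top bit of each field serves as an overflow flag so that a counter never carries into its neighbour; the function name and this particular overflow scheme are illustrative assumptions, not the published code.

```python
from typing import Dict, List


def shift_add_search(pattern: str, text: str, k: int) -> List[int]:
    """End positions (0-based) where pattern occurs in text with at most k mismatches."""
    m = len(pattern)
    if m == 0 or k < 0:
        raise ValueError("need a non-empty pattern and k >= 0")
    count_bits = k.bit_length()      # ceil(log2(k + 1)) bits store counts 0..k
    B = count_bits + 1               # plus one overflow bit per field
    low = sum(1 << (i * B) for i in range(m))  # lowest bit of each of the m fields
    high = low << count_bits                   # overflow bit of each field
    full = (1 << (m * B)) - 1                  # keep exactly m fields

    # match_bits[c] has the low bit of field i set iff pattern[i] == c.
    match_bits: Dict[str, int] = {}
    for i, p in enumerate(pattern):
        match_bits[p] = match_bits.get(p, 0) | (1 << (i * B))

    state = 0      # field i: mismatches of pattern[0..i] against the text ending here
    overflow = 0   # top bit of field i set once that alignment's count exceeded k
    last = (m - 1) * B
    ends: List[int] = []
    for j, c in enumerate(text):
        t = low ^ match_bits.get(c, 0)          # 1 in field i iff pattern[i] != c
        state = ((state << B) + t) & full       # shift fields up, add new mismatches
        overflow = ((overflow << B) | (state & high)) & full
        state &= full ^ high                    # clear overflow bits so no field carries
        if j >= m - 1 and not (overflow >> last) and (state >> last) <= k:
            ends.append(j)
    return ends


print(shift_add_search("abc", "abdxbcabc", 1))
# prints [2, 5, 8]
```

With k = 0 each field is a single overflow bit, so the scan reduces to exact matching; the word count per text symbol grows with B, which is the ⌈m log(k)/w⌉ factor.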

Boyer-Moore Strategy to Efficient Approximate String Matching

We propose a simple but efficient algorithm for searching all occurrences of a pattern or a class of patterns (length m) in a text (length n) with at most k mismatches. This algorithm relies on the Shift-Add algorithm of Baeza-Yates and Gonnet [6], which represents the current state of the search as a bit number and uses the ability of programming languages to handle bit words. State re...

A fast implementation of the Boyer–Moore string matching algorithm

String matching is the problem of finding all the occurrences of a pattern in a text. We present a new method to compute a combinatorial shift function (“best matching shift”) of the well-known Boyer–Moore string matching algorithm. Moreover we conduct experiments showing that the algorithm using this best matching shift is the most efficient in particular cases such as the search for patterns ...
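
For contrast with the bit-parallel methods in this list, the shift-function idea behind Boyer–Moore can be illustrated with Horspool's simplified bad-character shift. This is not the "best matching shift" of the cited paper; it is a smaller, well-known variant, sketched in Python under a hypothetical function name. After each window is checked, the window slides by a distance that depends only on the text symbol aligned with the last pattern position.

```python
from typing import Dict, List


def horspool_search(pattern: str, text: str) -> List[int]:
    """0-based start positions of exact occurrences (Horspool's variant of Boyer-Moore)."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Bad-character shift: for each symbol in pattern[:-1], the distance from its
    # rightmost occurrence there to the last pattern position. Later occurrences
    # overwrite earlier ones, so the smallest (safe) distance is kept.
    shift: Dict[str, int] = {}
    for i, c in enumerate(pattern[:-1]):
        shift[c] = m - 1 - i
    starts: List[int] = []
    pos = 0
    while pos <= n - m:
        # Boyer-Moore compares right to left; slice equality keeps this sketch short.
        if text[pos:pos + m] == pattern:
            starts.append(pos)
        # Symbols absent from pattern[:-1] allow a jump of the full length m.
        pos += shift.get(text[pos + m - 1], m)
    return starts


print(horspool_search("abab", "abababab"))
# prints [0, 2, 4]
```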

Improved Approach for Exact Pattern Matching

In this research we present the Bidirectional exact pattern matching algorithm [20] in detail. Bidirectional (BD) exact pattern matching (EPM) introduced a new idea: comparing the pattern with a Selected Text Window (STW) of the text string using two pointers (right and left) simultaneously in the searching phase. In the preprocessing phase, the Bidirectional EPM algorithm improved the shift decision by comparing righ...

The Shift-Match Number and String Matching Probabilities for Binary Sequences

We define the “shift-match number” for a binary string and we compute the probability of occurrence of a given string as a subsequence in longer strings in terms of its shift-match number. We thus prove that the string matching probabilities depend not only on the length of shorter strings, but also on the equivalence class of the shorter string determined by its shift-match number. PA...

Publication date: 2000